A Policy Search Method For Temporal Logic Specified Reinforcement Learning Tasks

Authors

  • Xiao Li
  • Yao Ma
  • Calin Belta
Abstract

Reward engineering is an important aspect of reinforcement learning. Whether the user's intentions are correctly encapsulated in the reward function can significantly affect the learning outcome. Current methods rely on manually crafted reward functions that often require parameter tuning to obtain the desired behavior. This tuning can be expensive when exploration requires systems to interact with the physical world. In this paper, we explore the use of temporal logic (TL) to specify tasks in reinforcement learning. A TL formula can be translated to a real-valued function that measures its level of satisfaction against a trajectory. We take advantage of this function and propose temporal logic policy search (TLPS), a model-free learning technique that finds a policy that satisfies the TL specification. A set of simulated experiments is conducted to evaluate the proposed approach.
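
The idea of translating a TL formula into a real-valued satisfaction measure can be illustrated with a minimal sketch. It assumes the common max/min quantitative semantics for "eventually" and "always" over a finite trajectory; the abstract does not state the paper's exact definitions, and the predicates `near_goal` and `safe`, the 1-D state, and all numeric thresholds are hypothetical.

```python
from typing import Callable, Sequence

State = Sequence[float]
# A predicate returns a signed margin: positive means it holds at that state.
Predicate = Callable[[State], float]


def eventually(pred: Predicate, traj: Sequence[State]) -> float:
    """Robustness of 'eventually pred': the best margin reached along the trajectory."""
    return max(pred(s) for s in traj)


def always(pred: Predicate, traj: Sequence[State]) -> float:
    """Robustness of 'always pred': the worst margin along the trajectory."""
    return min(pred(s) for s in traj)


def near_goal(s: State) -> float:
    # Hypothetical goal: within 0.5 of position 3.0.
    return 0.5 - abs(s[0] - 3.0)


def safe(s: State) -> float:
    # Hypothetical obstacle: stay more than 0.25 away from position 1.5.
    return abs(s[0] - 1.5) - 0.25


def robustness(traj: Sequence[State]) -> float:
    """Robustness of (eventually near_goal) AND (always safe); conjunction is a min."""
    return min(eventually(near_goal, traj), always(safe, traj))


good = [[0.0], [1.0], [2.0], [3.0]]   # reaches the goal, skirts the obstacle
bad = [[0.0], [1.5], [3.0]]           # passes through the obstacle
print(robustness(good), robustness(bad))
```

A positive value indicates that the trajectory satisfies the specification and a negative value that it violates it, so a policy search method could, under these assumptions, use this scalar as the objective to maximize in place of a hand-tuned reward.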

Similar resources

A Hierarchical Reinforcement Learning Method for Persistent Time-Sensitive Tasks

Reinforcement learning has been applied to many interesting problems, such as the famous TD-Gammon [1] and inverted helicopter flight [2]. However, little effort has been put into developing methods to learn policies for complex persistent tasks and tasks that are time-sensitive. In this paper we take a step towards solving this problem by using signal temporal logic (STL) as task specificati...

Temporal Difference and Policy Search Methods for Reinforcement Learning: An Empirical Comparison

Reinforcement learning (RL) methods have become popular in recent years because of their ability to solve complex tasks with minimal feedback. Both genetic algorithms (GAs) and temporal difference (TD) methods have proven effective at solving difficult RL problems, but few rigorous comparisons have been conducted. Thus, no general guidelines describing the methods’ relative strengths and weakne...

Continuous-action reinforcement learning with fast policy search and adaptive basis function selection

As an important approach to solving complex sequential decision problems, reinforcement learning (RL) has been widely studied in the community of artificial intelligence and machine learning. However, the generalization ability of RL is still an open problem and it is difficult for existing RL algorithms to solve Markov decision problems (MDPs) with both continuous state and action spaces. In t...

Reinforcement Learning of Fuzzy Logic Controllers for Quadruped Walking Robots

This paper presents a fuzzy logic controller (FLC) for the implementation of some behaviours of Sony legged robots. Adaptive heuristic critic (AHC) reinforcement learning is employed to refine the FLC. The actor part of AHC is a conventional FLC in which the parameters of input membership functions are learned by an immediate internal reinforcement signal. This internal reinforcement signal ...

Transfer Learning for Policy Search Methods

An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference (Sutton & Barto, 1998) approach to transfer in reinforcement learning (Sutton & Barto, 1998) tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies...

Journal:
  • CoRR

Volume abs/1709.09611

Publication date: 2017